spurious dependency
Gene Regulatory Network Inference in the Presence of Selection Bias and Latent Confounders
Gene regulatory network inference (GRNI) aims to discover how genes causally regulate each other from gene expression data. It is well-known that statistical dependencies in observed data do not necessarily imply causation, as spurious dependencies may arise from, such as non-coding RNAs. Numerous GRNI methods have thus been proposed to address this confounding issue. However, dependencies may also result from --only cells satisfying certain survival or inclusion criteria are observed--while these selection-induced spurious dependencies are frequently overlooked in gene expression data analyses. In this work, we show that such selection is ubiquitous and, when ignored or conflated with true regulations, can lead to flawed causal interpretation and misguided intervention recommendations. To address this challenge, a fundamental question arises: can we distinguish dependencies due to regulation, confounding, and crucially, selection? We show that gene perturbations offer a simple yet effective answer: selection-induced dependencies are under perturbation, while those from regulation or confounding are not. Building on this motivation, we propose GISL (Gene regulatory network Inference in the presence of Selection bias and Latent confounders), a principled algorithm that leverages perturbation data to uncover both true gene regulatory relations and non-regulatory mechanisms of selection and confounding up to the equivalence class. Experiments on synthetic and real-world gene expression data demonstrate the effectiveness of our method.
Generative multitask learning mitigates target-causing confounding
We propose generative multitask learning (GMTL), a simple and scalable approach to causal machine learning in the multitask setting. Our approach makes a minor change to the conventional multitask inference objective, and improves robustness to target shift. Since GMTL only modifies the inference objective, it can be used with existing multitask learning methods without requiring additional training. The improvement in robustness comes from mitigating unobserved confounders that cause the targets, but not the input. We refer to them as \emph{target-causing confounders}. These confounders induce spurious dependencies between the input and targets. This poses a problem for conventional multitask learning, due to its assumption that the targets are conditionally independent given the input.
Generative multitask learning mitigates target-causing confounding
We propose generative multitask learning (GMTL), a simple and scalable approach to causal machine learning in the multitask setting. Our approach makes a minor change to the conventional multitask inference objective, and improves robustness to target shift. Since GMTL only modifies the inference objective, it can be used with existing multitask learning methods without requiring additional training. The improvement in robustness comes from mitigating unobserved confounders that cause the targets, but not the input. We refer to them as \emph{target-causing confounders}.
Generative multitask learning mitigates target-causing confounding
Makino, Taro, Geras, Krzysztof, Cho, Kyunghyun
We propose a simple and scalable approach to causal representation learning for multitask learning. Our approach requires minimal modification to existing ML systems, and improves robustness to prior probability shift. The improvement comes from mitigating unobserved confounders that cause the targets, but not the input. We refer to them as target-causing confounders. These confounders induce spurious dependencies between the input and targets. This poses a problem for the conventional approach to multitask learning, due to its assumption that the targets are conditionally independent given the input. Our proposed approach takes into account the dependency between the targets in order to alleviate target-causing confounding. All that is required in addition to usual practice is to estimate the joint distribution of the targets to switch from discriminative to generative classification, and to predict all targets jointly. Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.